Abstract

Bertoin and Le Gall [1] introduced a certain probability-measure-valued
Markov process that describes the evolution of a population, such that a
sample from this population would exhibit a genealogy given by the
so-called Λ-coalescent, or coalescent with multiple collisions, introduced
independently by Pitman [9] and Sagitov [10]. We show how this process can
be extended to the case where lineages can experience mutations.
Regenerative compositions enter naturally into this model, which is
somewhat surprising, considering a negative result by Möhle [7].

AMS 2000 Subject classification: 60G09, 60G57, 92D25. 
Keywords: population model, coalescent, mutations, exchangeability, sam-

1 Introduction

A coalescent with multiple collisions, or Λ-coalescent, $\Pi = (\Pi_t)_{t \ge 0}$,
is a Markov process on the space $\mathcal{P}(\mathbb{N})$, the partitions of
$\mathbb{N} = \{1, 2, \ldots\}$, such that for all $n$, $\Pi^{(n)}$, the
restriction of $\Pi$ to $[n] = \{1, \ldots, n\}$, is a Markov process with the
following transitions: If $\Pi_t^{(n)}$ has $b$ blocks, then any collection of
$k$ blocks coalesces into one block at rate $\lambda_{b,k}$ for
$2 \le k \le b \le n$. Note that the rate only depends on the number of
blocks, not their sizes. By considering $\Pi^{(n)}$ and $\Pi^{(n+1)}$ one
realizes that $\lambda_{b,k} = \lambda_{b+1,k} + \lambda_{b+1,k+1}$.
From this it follows, see [9], that

$$\lambda_{b,k} = \int_{[0,1]} x^{k-2} (1 - x)^{b-k} \, \Lambda(dx) \quad \text{for } 2 \le k \le b,$$

for some finite measure $\Lambda$ on [0, 1].
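
The rate formula and the consistency relation above can be checked
numerically. The following is a minimal sketch, assuming for illustration
that $\Lambda$ is the Beta($\alpha$, $\beta$) probability distribution, so that
the integral reduces to a ratio of Beta functions; the function names
`log_beta` and `rate` are hypothetical and not part of the paper.

```python
from math import exp, lgamma


def log_beta(p, q):
    """Logarithm of the Beta function B(p, q)."""
    return lgamma(p) + lgamma(q) - lgamma(p + q)


def rate(b, k, alpha=1.0, beta=1.0):
    """lambda_{b,k} when Lambda is assumed to be the Beta(alpha, beta) law.

    The integral of x^(k-2) (1-x)^(b-k) against the Beta density equals
    B(k - 2 + alpha, b - k + beta) / B(alpha, beta).
    """
    if not 2 <= k <= b:
        raise ValueError("need 2 <= k <= b")
    return exp(log_beta(k - 2 + alpha, b - k + beta) - log_beta(alpha, beta))


# Consistency relation: lambda_{b,k} = lambda_{b+1,k} + lambda_{b+1,k+1}.
for b in range(2, 8):
    for k in range(2, b + 1):
        lhs = rate(b, k, 0.5, 1.5)
        rhs = rate(b + 1, k, 0.5, 1.5) + rate(b + 1, k + 1, 0.5, 1.5)
        assert abs(lhs - rhs) <= 1e-12 * lhs
```

The relation holds for any $\Lambda$ because
$x^{k-2}(1-x)^{b-k} = x^{k-2}(1-x)^{b+1-k} + x^{k-1}(1-x)^{b-k}$; the Beta
family is only a convenient choice with a closed form.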

Λ-coalescents were introduced independently by Pitman [9] and Sagitov
[10] as a generalization of the Kingman coalescent [1]. All these processes can
arise as limiting processes when studying the genealogy for a finite sample
of individuals from a haploid (one parent per child) population, see |1H
Proposition 7].

Consider a large population with constant size N for all generations, 
which are furthermore non-overlapping. A sample of n individuals, labeled 
by $[n]$, forms a partition by grouping together those who have had a common
ancestor by generation $t$ backwards in time, i.e. those whose lineages have
coalesced into a common lineage by that time. The Λ-coalescent, restricted
to $[n]$, is a possible limiting process when $N \to \infty$, and time and the
distribution of the number of children of each individual are scaled properly.
Furthermore, it is possible to obtain a coalescent with simultaneous multiple
collisions, but we will in this paper not consider such so-called Ξ-coalescents,
see [SI [H] for more details.

The Kingman coalescent is a Λ-coalescent with $\Lambda = \delta_0$, i.e. with the
only type of transition being a merger of two blocks at a time. This process 
is the natural limiting process for many population models, roughly speak- 
ing those models where the number of children for each individual always is 
small compared to the total population size as the size tends to infinity. The 
probability of more than two lineages in the sample coalescing at the same 
time is then negligible in the limit. A Λ-coalescent with $\Lambda \ne \delta_0$ corresponds
to a population where occasionally single individuals have offspring consti- 
tuting a positive fraction of the entire next generation as the population 
size tends to infinity. If several of the lineages in your sample belong to that 
fraction, they will coalesce into a single lineage at that moment. 

The main result of this paper is a description of the dynamics of the 
whole population when all lineages experience neutral mutations, i.e. mu- 
tations that do not influence the chance of survival. Earlier studies have 
only described how introducing mutations influences the dynamics of the 
genealogy of a sample from the population. 

We will proceed as follows. First, in Section 2, we will acquaint ourselves
with useful representations of random partitions and coalescent processes.
Here we will also find a description of a population model such that the
genealogy of a sample from this population is given by the Λ-coalescent.
When we introduce mutations in the population, an obvious way of
partitioning a sample of individuals is by their common genotypes. In Section 3
a general recursion formula is given for the distribution of the family sizes
in the sample. Section 4 might at first be seen as a detour into the theory of
regenerative composition structures, i.e. a special kind of ordered partitions,
especially since it is known that our type of partitions can never appear from
these regenerative composition structures if one simply disregards the order
in the composition. This theory, however, is used in the last Section 5, in
which we present a model for the whole population, and not just a sample
from it, when all lineages experience neutral mutations, such that the
distribution of a sample from this population is in accordance with the result
in Section 3.

2 Paintboxes and a population process 

In this section we will see how one can use probability measures to construct
random partitions of $\mathbb{N}$. Let $\rho$ be a probability measure with
atoms of sizes $b = (b_i)_{i \in \mathbb{N}}$ in non-increasing order. Let
$(R_i)_{i \in \mathbb{N}}$ be an i.i.d. sample from $\rho$ and define the
equivalence relation $\sim_\rho$ on $\mathbb{N}$ by $i \sim_\rho j$ if
$R_i = R_j$. Then the equivalence classes of $\sim_\rho$ form the sought
partition of $\mathbb{N}$.

A more common way of using such a sequence $b$ is to partition [0, 1]
into intervals $(I_i)_{i \in \mathbb{N}_0}$ with lengths
$(b_i)_{i \in \mathbb{N}_0}$, where $b_0 := 1 - \sum_{i \ge 1} b_i$; let
the equivalence relation $\sim_b$ on $\mathbb{N}$ be defined by
$i \sim_b j \iff V_i, V_j \in I_k$ for some $k \ge 1$, where
$(V_i)_{i \in \mathbb{N}}$ are i.i.d. $U(0, 1)$, and let $i$ such that
$V_i \in I_0$ be in equivalence classes of their own. In general, one can use
a random measure $\pi$, and carry out the construction pointwise, given
$\pi = \rho$. This is equivalent to using atoms of random sizes
$\beta = (\beta_i)_{i \in \mathbb{N}}$, and the construction is called
a paintbox construction, see [1].
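
The interval version of the construction can be sketched as follows. This
is an illustration only: the function name `paintbox_partition` is
hypothetical, and the layout of the intervals (with $I_1, I_2, \ldots$ placed
from left to right and $I_0$ last) is an assumption that does not affect the
law of the resulting partition.

```python
from bisect import bisect_right
from itertools import accumulate


def paintbox_partition(b, v):
    """Partition of {1, ..., len(v)} from atom sizes b and uniforms v.

    Assumed layout: I_1 = [0, b_1), I_2 = [b_1, b_1 + b_2), ..., and
    I_0 = [b_1 + b_2 + ..., 1] collects the remaining mass b_0.
    """
    ends = list(accumulate(b))          # right endpoints of I_1, I_2, ...
    blocks = {}                         # interval index -> labels that hit it
    singletons = []                     # labels whose uniform fell in I_0
    for i, vi in enumerate(v, start=1):
        k = bisect_right(ends, vi)      # number of endpoints <= vi
        if k < len(b):
            blocks.setdefault(k, []).append(i)
        else:
            singletons.append([i])
    return sorted(list(blocks.values()) + singletons)


print(paintbox_partition([0.5, 0.3], [0.1, 0.6, 0.2, 0.9, 0.95, 0.7]))
```

With atom sizes (0.5, 0.3), labels 4 and 5 fall in $I_0$ and therefore end
up in blocks of their own, even though they fall in the same interval.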

A random partition of $\mathbb{N}$ is called exchangeable if its restriction
to $[n]$ has a distribution that is invariant under permutations of the labels
$[n]$ for all $n \in \mathbb{N}$. For example, if $(\Pi_t)_{t \ge 0}$ is a
Λ-coalescent, then $\Pi_t$ is exchangeable for all $t$. Kingman has shown, see
e.g. [1], that any exchangeable random partition of $\mathbb{N}$ can be
obtained from a paintbox construction, e.g. with
$\beta = (\beta_i)_{i \in \mathbb{N}}$ being the almost sure limit of
$(l_i(n)/n)_{i \in \mathbb{N}}$, where $l_i(n)$ is the size of the $i$th
largest block of the partition restricted to $[n]$. Here, and elsewhere
in this paper, we understand limits to be taken as $n \to \infty$, unless
otherwise indicated.

If one enumerates the blocks of $\Pi_s = \{A_1^s, A_2^s, \ldots\}$, and for
$t > s$ considers the blocks of $\Pi_t = \{A_1^t, A_2^t, \ldots\}$, then each
$A_j^t = \bigcup_{i \in C_j^{s,t}} A_i^s$ for some $C_j^{s,t}$. It is a
property of coalescent processes that the partition
$\Pi^{s,t} = \{C_1^{s,t}, C_2^{s,t}, \ldots\}$ is exchangeable and distributed
as $\Pi_{t-s}$. Bertoin and Le Gall [1] showed that there exists a collection
$(\pi_{s,t})_{s<t}$ of random probability measures on [0, 1] such that
$\sim_{\pi_{s,t}}$ corresponds to $\Pi^{s,t}$ for all $s < t$. For fixed
$s$ and increasing $t$ this collection describes the genealogy of the
population further and further backwards in time. The random partition
$\Pi^{s,t}$ should be interpreted as describing how the lineages present in
the population at time $s$ coalesce into lineages at time $t$, and it thus has
a meaning even with negative arguments $s < t$, since this corresponds to
future events in the population relative to time zero. We shall study, and
later extend, the dynamics of the Markov process
$\rho = (\rho_t)_{t \ge 0} := (\pi_{-t,0})_{t \ge 0}$, which describes the
evolution of the population forwards in time. Heuristically, $\rho_t(dr)$
represents the descendants at time $t$ of the fraction $dr$ of the population
at time zero.

If $\Lambda(0) = 0$, then the dynamics of $\rho$ can be described by a measure
$\nu(dx) := \Lambda(dx)/x^2$. Let $\{(T_i, X_i, U_i)\}_{i \in \mathbb{N}}$ be a
Poisson process on $\mathbb{R} \times (0, 1] \times (0, 1)$ with intensity
measure $dt \otimes \nu(dx) \otimes du$. Assume for the moment
$|\nu| := \nu((0, 1]) < \infty$. We will use this Poisson point process to
construct the process $\rho$. We let $(T_i)_{i \in \mathbb{N}}$, which by the
assumption is a homogeneous Poisson process with rate $|\nu|$, be the jump
times of $\rho$. At a time $T_i$, the conditional law of $\rho_{T_i}$, given
$\rho_{T_i-}$, is

$$(1 - X_i) \rho_{T_i-} + X_i \delta_{R_i}, \qquad (1)$$

where $X_i$ has distribution $\nu/|\nu|$ and $R_i$ is a sample from
$\rho_{T_i-}$, picked by the inverse transformation method:
$R_i := \inf\{r : \rho_{T_i-}([0, r]) > U_i\}$. Between jumps, $\rho$ remains
constant.

The heuristic interpretation of these dynamics is that a person is chosen
from the population just before the jump at time $T_i$, and she is identified
with her family labeled $R_i$. At the time of the jump, she begets offspring
of proportion $X_i$ of the total population, and we say that the litter $i$,
born at $T_i$, has size $X_i$. The rest of the population must thus be scaled
down by a factor $1 - X_i$ and the atom corresponding to her family is
increased with mass $X_i$.
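
A single jump of the form (1) can be sketched in code. This is a minimal
illustration under stated assumptions: the measure is represented as a
dictionary of atoms (location to mass) plus a Lebesgue part carrying the
remaining mass, which is the form it takes if the process is started from
the Lebesgue measure; the names `sample_location` and `jump` are
hypothetical.

```python
def sample_location(atoms, u):
    """inf{r : rho([0, r]) > u} for rho = c * Lebesgue + sum of atoms."""
    c = 1.0 - sum(atoms.values())        # mass of the Lebesgue part
    cum, prev = 0.0, 0.0                 # cum = rho([0, prev])
    for loc in sorted(atoms):
        if cum + c * (loc - prev) > u:   # level u is reached inside (prev, loc)
            return prev + (u - cum) / c
        cum += c * (loc - prev)
        if cum + atoms[loc] > u:         # level u is reached by the atom at loc
            return loc
        cum += atoms[loc]
        prev = loc
    return prev + (u - cum) / c if c > 0 else prev


def jump(atoms, x, u):
    """Apply (1): scale everything by 1 - x and add mass x at the sampled R."""
    r = sample_location(atoms, u)
    new_atoms = {loc: (1.0 - x) * m for loc, m in atoms.items()}
    new_atoms[r] = new_atoms.get(r, 0.0) + x
    return new_atoms


rho = jump({}, x=0.5, u=0.3)     # first litter: a new atom of mass 0.5 at 0.3
rho = jump(rho, x=0.2, u=0.4)    # the parent is drawn from the existing family
print(rho)
```

In the second call the uniform variable 0.4 selects the existing atom, so
the family at 0.3 is first scaled down to 0.4 and then gains the new litter
of size 0.2.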

Bertoin and Le Gall [1, Corollary 2] showed the following.

Proposition 1 If $(\nu_n)_{n \in \mathbb{N}}$ is a sequence of finite measures,
with $\Lambda_n(dx) = x^2 \nu_n(dx)$ converging weakly to a finite measure
$\Lambda$ on [0, 1], then the sequence of processes
$(\rho^n)_{n \in \mathbb{N}}$, where each $\rho^n$ is governed by the
respective $\nu_n$, converges in distribution, in the sense of weak
convergence of finite-dimensional marginals, to the process $\rho$
corresponding to the collection $(\pi_{s,t})_{s<t}$ associated with the
limiting Λ-coalescent. If $\Lambda(0) = \Lambda(1) = 0$ and
$\int_{(0,1)} x \, \nu(dx) < \infty$, then the convergence can be strengthened
to almost sure convergence.

Thus, $\rho$ has a meaning even in the case $|\nu| = \infty$, and in
particular, the description of $\rho$ above, for the case $|\nu| < \infty$,
can be extended to the case $\int_{(0,1)} x \, \nu(dx) < \infty$.

3 Mutations in the sample 

In a population genetics setting, it is natural to introduce mutations along 
the lineages and ask how the individuals of your sample are partitioned into 
different families according to their genotype. We assume that mutations 
always give rise to new types of individuals never seen before in the popula- 
tion (the so-called infinite alleles model) , and that when tracing the lineages 
backwards in time, there is a constant intensity $\mu$ per lineage for a
mutation to occur, i.e. if we draw the family tree of the sample, the
mutations constitute a homogeneous Poisson process with intensity $\mu$ along
each branch, see Figure 1 for an example.

Figure 1: A family tree for a sample of 7 individuals, $t_0$ is present time.
Chronological time increases to the right, whereas the time of the coalescent
increases to the left. Coalescence of lineages occurs at times
$c_1, \ldots, c_4$. Mutations are denoted by × and occur at times $m_1$, $m_2$
and $m_3$. The partition of this sample into families is
{{1, 2, 3, 5, 6}, {4}, {7}}.

A quantity of interest is $q(a)$, $a = (a_1, a_2, \ldots)$, the probability of
observing a partition with $a_i$ families of size $i$. When we trace the
genealogy of the lineages backwards in time, the probability of a mutation to
occur first is $\mu n/(\mu n + \lambda_n)$, and the probability of a collision
of $k$ lineages happening first is
$\binom{n}{k} \lambda_{n,k}/(\mu n + \lambda_n)$, for $k = 2, \ldots, n$. By
the Markov property of the Λ-coalescent, we can condition on the type of event
that happens first, and obtain a recursion for $q(a)$. Möhle [6] was the first
to provide this recursion for Λ- (and even Ξ-) coalescents. Let $e_k$ be the
$k$th unit vector and
$\lambda_n := \sum_{k=2}^{n} \binom{n}{k} \lambda_{n,k}$.

Proposition 2 (Möhle's recursion [6]) $q(e_1) = 1$, and

$$q(a) = \frac{\mu n}{\mu n + \lambda_n} \, q(a - e_1) + \sum_{k=2}^{n} \frac{\binom{n}{k} \lambda_{n,k}}{\mu n + \lambda_n} \sum_{j=1}^{n-k+1} \frac{j(a_j + 1)}{n - k + 1} \, q(a + e_j - e_{j+k-1}) \qquad (2)$$

for $n = \sum_i i a_i > 1$, where $q(a) = 0$ if any $a_j < 0$.

There are no known closed formulas solving (2) for general $\Lambda$, except
for the cases $\Lambda = \delta_0$ (Kingman's coalescent) and
$\Lambda = \delta_1$.

The parts of this formula should be interpreted as follows. If a mutation
occurs first, the rest of the sample is described by $a - e_1$. If a merger of
$k$ lineages occurs first, and it occurs in a family represented by $j + k - 1$
lineages, then after that merger, the sample will consist of $n - k + 1$
lineages and be described by $a + e_j - e_{j+k-1}$. In particular, there will
now be $a_j + 1$ families of size $j$. The probability that the merger of $k$
lineages affected a family of size $j + k - 1$ is given by
$j(a_j + 1)/(n - k + 1)$, since the merger could have resulted in any of the
$j$ lineages in any of the $a_j + 1$ families with equal probability by the
exchangeability. We refer to Möhle [6] and Dong et al. [2]
for more detailed discussions. The latter paper extends coalescent processes,
so that their blocks become frozen when they encounter a mutation, and
then do not partake in the further evolution of the process. With frozen
blocks enclosed by ( and ), the path of this process, realized as in Figure 1,
would be

{{1}, ..., {7}} → {{1}, ..., {6}, (7)} → {{1}, {2, 5}, {3}, {4}, {6}, (7)}
→ {{1}, {2, 5}, {3}, (4), {6}, (7)} → {{1, 3, 6}, {2, 5}, (4), (7)}
→ {{1, 2, 3, 5, 6}, (4), (7)} → {(1, 2, 3, 5, 6), (4), (7)}.

The partition into families is obtained when all blocks are frozen. 
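
The recursion (2) can be evaluated directly for small samples. The sketch
below is an illustration, not the authors' implementation: the names
`sampling_probability` and `kingman_rate` are hypothetical, the vector $a$
is stored as a tuple whose entry `a[i-1]` is $a_i$, and exact rational
arithmetic is used so that the values can be compared by hand. The default
rates are those of Kingman's coalescent, $\lambda_{n,2} = 1$ and
$\lambda_{n,k} = 0$ for $k > 2$.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


def kingman_rate(n, k):
    """lambda_{n,k} for Lambda = delta_0: only binary mergers, at unit rate."""
    return 1 if k == 2 else 0


def sampling_probability(a, mu, rate=kingman_rate):
    """Evaluate q(a) by recursion (2); rate(n, k) must return lambda_{n,k}."""

    @lru_cache(maxsize=None)
    def q(a):
        if min(a) < 0:
            return Fraction(0)
        n = sum(i * ai for i, ai in enumerate(a, start=1))
        if n == 1:
            return Fraction(1)                      # q(e_1) = 1
        lam_n = sum(comb(n, k) * rate(n, k) for k in range(2, n + 1))
        total = mu * n + lam_n

        def moved(add, drop):
            # a + e_add - e_drop (add is None when only e_drop is removed)
            b = list(a)
            if add is not None:
                b[add - 1] += 1
            b[drop - 1] -= 1
            return tuple(b)

        value = mu * n / total * q(moved(None, 1))  # a mutation occurs first
        for k in range(2, n + 1):                   # a k-merger occurs first
            weight = comb(n, k) * rate(n, k) / total
            if weight == 0:
                continue
            for j in range(1, n - k + 2):
                share = Fraction(j * (a[j - 1] + 1), n - k + 1)
                value += weight * share * q(moved(j, j + k - 1))
        return value

    a = tuple(a)
    n0 = sum(i * ai for i, ai in enumerate(a, start=1))
    return q(a + (0,) * max(0, n0 - len(a)))        # pad so that e_n exists


mu = Fraction(1, 2)
for a in [(3, 0, 0), (1, 1, 0), (0, 0, 1)]:
    print(a, sampling_probability(a, mu))
```

For $n = 3$ and $\mu = 1/2$ the three probabilities are 1/6, 1/2 and 1/3,
which sum to one and agree with the Ewens sampling formula with parameter
$\theta = 2\mu$, as expected for Kingman's coalescent. Other coalescents can
be tried by passing a different `rate` function.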



4 Regenerative composition structures 

Most results in this section are from Gnedin and Pitman [3]. A partition
of $n \in \mathbb{N}$ is an unordered collection of natural numbers
$\{n_1, \ldots, n_k\}$ such that $n_1 + \cdots + n_k = n$. An ordered
partition is called a composition, and we say that $n_1, \ldots, n_k$ are its
parts. A composition structure is a sequence $(C_n)_{n \in \mathbb{N}}$ of
random compositions of $n$ such that if $n$ balls are distributed into an
ordered series of boxes according to $C_n$, then $C_{n-1}$ is obtained by
discarding one of the balls picked uniformly at random, and deleting an
empty box in case one is created. A composition structure is regenerative if
for all $n > m \ge 1$, given that the first part is $m$, the remaining
composition of $n - m$ is distributed as $C_{n-m}$.

We will see that one can obtain regenerative compositions with the
appropriate sampling procedure. Let $(V_i)_{i \in \mathbb{N}}$ be i.i.d.
$U(0, 1)$ and let $(V_{in})_{i \in [n]}$ be the ordered sample of
$(V_i)_{i \in [n]}$, meaning $V_{1n} < \cdots < V_{nn}$. Given a closed
set $S \subset [0, 1]$, we can construct a composition $C_n$ as follows.
Partition $[n]$ into blocks of consecutive integers by letting $j$ and $j + 1$
be in different blocks if $[V_{jn}, V_{j+1,n}] \cap S \ne \emptyset$. Let the
parts of $C_n$ be given by the sizes of the blocks in increasing order of
their elements, see Figure 2. We will in general also allow a random closed
set $\mathcal{S} \subset [0, 1]$, independent of $(V_i)_{i \in \mathbb{N}}$,
where the construction is carried out given the realization $\mathcal{S} = S$.
We then say that $C_n$ is obtained by sampling from $\mathcal{S}$.
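
The sampling procedure can be sketched for a set $S$ with finitely many
points. The restriction to a finite set and the function name
`composition_from_set` are assumptions made for illustration; a general
closed set would need a different representation.

```python
from bisect import bisect_left


def composition_from_set(points, v):
    """Composition of len(v) obtained by sampling from the finite set `points`."""
    s = sorted(points)
    vs = sorted(v)                       # ordered sample V_1n < ... < V_nn
    parts = [1]
    for lo, hi in zip(vs, vs[1:]):
        i = bisect_left(s, lo)           # first point of S that is >= lo
        if i < len(s) and s[i] <= hi:    # [lo, hi] meets S: start a new block
            parts.append(1)
        else:                            # same block as the previous sample
            parts[-1] += 1
    return tuple(parts)


print(composition_from_set([0.2, 0.3, 0.5],
                           [0.1, 0.25, 0.35, 0.4, 0.6, 0.7, 0.8]))
```

The example uses seven sample points and a three-point set chosen so that
the resulting composition is (1, 1, 2, 3).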

For any closed set $S \subset [0, 1]$ and $z \in [0, 1)$, define
$D(S, z) := \inf S \cap (z, 1]$, where we let $\inf \emptyset := 1$. For $S$
and $z$ such that $D(S, z) < 1$, define

$$S^{(z)} := \left\{ \frac{s - D(S, z)}{1 - D(S, z)} : s \in S \cap [D(S, z), 1] \right\}.$$

This is the part of $S$ to the right of $D(S, z)$ scaled back to [0, 1], see
Figure 2. We say that a random closed set $\mathcal{S} \subset [0, 1]$ is
multiplicatively regenerative if for each $z \in [0, 1)$, given
$D(\mathcal{S}, z) < 1$, the set $\mathcal{S}^{(z)}$ is independent
of $[0, D(\mathcal{S}, z)] \cap \mathcal{S}$, and has the same distribution as
$\mathcal{S}$.

Figure 2: An illustration of sampling with $V_1, \ldots, V_7$ from $S$
resulting in $(n_1, n_2, n_3, n_4) = (1, 1, 2, 3)$, and how to construct
$S^{(z)}$ from $S$.

Let $\{(T_i, X_i)\}_{i \in \mathbb{N}}$ be a Poisson process on
$\mathbb{R}_+ \times (0, 1)$ with intensity measure $dt \otimes \nu(dx)$ for a
measure $\nu$ with $\int_0^1 x \, \nu(dx) < \infty$, and let $\mu \ge 0$ be a
constant. The notation here is intentionally similar to the one in the
previous sections of this paper, but we assume for the moment no relation to
these. We call the process $Z = (Z_t)_{t \ge 0}$ a multiplicative subordinator
with characteristics $(\mu, \nu)$ if

$$Z_t := 1 - e^{-\mu t} \prod_{i : T_i \le t} (1 - X_i),$$

for all $t \ge 0$. The name is justified by the property that
$(1 - Z_{t'})/(1 - Z_t)$ has the same distribution as $1 - Z_{t'-t}$ and is
independent of $(Z_u)_{0 \le u \le t}$ for $t' > t$. We obtain an ordinary
subordinator by the transformation $Z_t \mapsto -\log(1 - Z_t)$.
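
For a finite list of jumps the definition of $Z_t$ can be evaluated
directly. The sketch below assumes that the jumps are given explicitly as
pairs $(T_i, X_i)$ rather than drawn from a Poisson process; the function
name `mult_subordinator` is hypothetical.

```python
from math import exp, log, prod


def mult_subordinator(t, mu, jumps):
    """Z_t = 1 - exp(-mu * t) * prod over T_i <= t of (1 - X_i)."""
    return 1.0 - exp(-mu * t) * prod(1.0 - x for (ti, x) in jumps if ti <= t)


jumps = [(1.0, 0.5), (2.0, 0.25)]
z = mult_subordinator(3.0, 0.1, jumps)

# The transformation -log(1 - Z_t) turns the product into a sum:
# drift mu * t plus one increment -log(1 - X_i) per jump.
additive = 0.1 * 3.0 - log(1 - 0.5) - log(1 - 0.25)
assert abs(-log(1.0 - z) - additive) < 1e-12
```

The final check illustrates why the transformed process is an ordinary
subordinator: it has linear drift and non-negative jumps.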

Remark 1 We need the moment condition on $\nu$ above to obtain a non-trivial
process $Z$, since if $\int_0^1 x \, \nu(dx) = \infty$, then
$\sum_{i : T_i \le t} \log(1 - X_i) = -\infty$ almost surely for $t > 0$, see
Campbell's Theorem in O p. 28], and thus
$\prod_{i : T_i \le t} (1 - X_i) = 0$ for all $t > 0$.

Let $\mathcal{R}$ be the closed range of the multiplicative subordinator $Z$.
Proposition 3 collects some results of [3].

Proposition 3 The closed range $\mathcal{R}$ of $Z$ is multiplicatively
regenerative, and conversely, all multiplicatively regenerative sets can be
seen as the range of some multiplicative subordinator, whose characteristics
are determined up to a positive constant. Sampling from $\mathcal{R}$ produces
a regenerative composition structure, and all regenerative composition
structures can be obtained by sampling from a regenerative set.

Since we have these relations between regenerative composition structures,
multiplicative subordinators, and multiplicatively regenerative sets, we also
say that $(\mu, \nu)$ are the characteristics of the regenerative composition
structure of the proposition. In particular, the probability of the first part
having size $m$ in $C_n$ is $q(n : m) = \Phi(n : m)/\Phi(n)$, where

$$\Phi(n : m) = \mu n \, 1(m = 1) + \binom{n}{m} \int_0^1 x^m (1 - x)^{n-m} \, \nu(dx), \qquad (3)$$

and $\Phi(n) = \sum_{m=1}^{n} \Phi(n : m)$. Note that the characteristics
$(\mu, \nu)$ and $(c\mu, c\nu)$, $c > 0$, produce the same regenerative
composition structure. We will need more detailed results about the first part
of a regenerative composition $C_n$, $n \ge 2$. It can have size one if either
$V_{1n} \in \mathcal{R}$, or $V_{1n} \notin \mathcal{R}$ and
$\mathcal{R} \cap [V_{1n}, V_{2n}] \ne \emptyset$. The expressions for the
following probabilities are taken from the proof of Theorem 5.2 on p. 457 of
[3].

$$q(n : 1)' := P(V_{1n} \in \mathcal{R}) = \frac{\mu n}{\Phi(n)}, \qquad (4)$$

$$q(n : 1)'' := P(V_{1n} \notin \mathcal{R}, [V_{1n}, V_{2n}] \cap \mathcal{R} \ne \emptyset) = \frac{n}{\Phi(n)} \int_0^1 x (1 - x)^{n-1} \, \nu(dx). \qquad (5)$$
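
The quantities in (3), (4) and (5) are easy to compute when $\nu$ has
finitely many atoms. The following sketch makes that assumption, with $\nu$
given as a list of pairs (atom, weight); the function names are hypothetical.

```python
from math import comb


def phi_nm(n, m, mu, nu):
    """Phi(n:m) of (3) for nu = sum of w * delta_x over the pairs (x, w)."""
    integral = sum(w * x**m * (1 - x) ** (n - m) for x, w in nu)
    return (mu * n if m == 1 else 0.0) + comb(n, m) * integral


def phi(n, mu, nu):
    """Phi(n), the sum of Phi(n:m) over m = 1, ..., n."""
    return sum(phi_nm(n, m, mu, nu) for m in range(1, n + 1))


def first_part_law(n, mu, nu):
    """The probabilities q(n:m), m = 1, ..., n, of the size of the first part."""
    total = phi(n, mu, nu)
    return [phi_nm(n, m, mu, nu) / total for m in range(1, n + 1)]


nu = [(0.5, 2.0), (0.25, 1.0)]
law = first_part_law(5, 0.3, nu)

# q(n:1) splits into the two cases (4) and (5).
q1_prime = 0.3 * 5 / phi(5, 0.3, nu)
q1_double = 5 / phi(5, 0.3, nu) * sum(w * x * (1 - x) ** 4 for x, w in nu)
assert abs(law[0] - (q1_prime + q1_double)) < 1e-12
```

Multiplying both $\mu$ and the weights of $\nu$ by the same constant leaves
`first_part_law` unchanged, in line with the remark that $(\mu, \nu)$ and
$(c\mu, c\nu)$ give the same composition structure.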

Möhle [7, Theorem 3.1] showed

Proposition 4 The recursion (2) cannot be solved by a partition obtained
by disregarding the order of the parts of a regenerative composition structure,
unless $\Lambda$ has all its mass in either 0 or 1.



5 Mutations in the population 

For a population without mutations, $\rho_t \to \delta_e$ in distribution as
$t \to \infty$, where $e$ has distribution $U[0, 1]$ and is called the
primitive Eve [1, Proposition 1 and Definition 4], so that all of the
population belongs to the primitive
Eve's family. This is a sort of genetic drift where, by chance, some genotype 
eventually makes up the whole population. When mutations are possible, 
no such absorbing state exists since new mutations appear, and we can hope 
for the existence of a non-trivial stationary distribution of p. 

We shall now investigate what happens with $\rho$, describing the evolution
of the population forwards in time, when individual lineages mutate at
constant rate $\mu$. The heuristic interpretation will be that a constant
mutation rate $\mu$ erodes all families at the same rate. The mutated lineages
are unique and each one only takes up an infinitesimal fraction of the whole
population until they possibly increase their size to a positive fraction of
the population by a jump. They could also experience yet another mutation but
that does not matter in this setting since we are not interested in the actual
genotypes; all that matters is that they differ. In the case with finite
intensity of births of new litters, the jump mechanism will be the same as in
(1), but for $t$ between two consecutive jump times, say $\sigma$ and $\tau$,
we will have

$$\rho_t = e^{-\mu(t - \sigma)} \rho_\sigma + (1 - e^{-\mu(t - \sigma)}) \lambda, \qquad (6)$$

where $\lambda$ is the Lebesgue measure on [0, 1].
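
The erosion between jumps in (6) can be sketched as follows, assuming as
before that the measure is stored as a dictionary of atoms together with the
mass of its Lebesgue part; the function name `erode` and this representation
are illustrative choices, not taken from the paper.

```python
from math import exp


def erode(atoms, leb, mu, dt):
    """Apply (6) over a time span dt with no jumps.

    Every part of the measure is scaled by exp(-mu * dt), and the mass
    lost in this way is added to the Lebesgue part.
    """
    decay = exp(-mu * dt)
    new_atoms = {r: decay * m for r, m in atoms.items()}
    return new_atoms, decay * leb + (1.0 - decay)


atoms, leb = erode({0.2: 0.5}, 0.5, mu=0.3, dt=1.0)
assert abs(sum(atoms.values()) + leb - 1.0) < 1e-12   # still a probability measure
```

Eroding for a time $s$ and then for a time $t$ gives the same result as
eroding once for $s + t$, which is what makes the description between jump
times consistent.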

To make this rigorous, we will proceed in several steps. We will first study 
the litters without any genealogical relationships. Since a family consists of 
litters, claiming that it erodes at a constant rate implies that its litters
also must do so at the same rate. We will describe the composition of a
population consisting of eroding litters and "mutants", or singletons, with
a probability measure on $[0, \infty)$. The process describing the evolution
of the population, still disregarding possible family ties between litters,
will then be shown to converge to a stationary distribution. After that, we
will impose a genealogy on the litters, meaning a partial order describing
who is a descendant of whom. This will enable us to define $\rho$ as we want.
Finally, Theorem 2 validates our construction by stating that a sample from
this population would have the same sampling distribution as from a
Λ-coalescent with mutations.

We will for the rest of this section assume that $\Lambda(0) = \Lambda(1) = 0$
and $\int_{(0,1)} x^{-1} \Lambda(dx) < \infty$. We let
$\nu(dx) := \Lambda(dx)/x^2$, as in Section 2, and thus
$\int_0^1 x \, \nu(dx) < \infty$. Let $\{(T_i, X_i, U_i)\}_{i \in \mathbb{N}}$
be a Poisson process on $\mathbb{R} \times (0, 1) \times (0, 1)$ with
intensity measure $dt \otimes \nu(dx) \otimes du$, whose points denote the
times of birth, and the sizes of the litters born in the population, and
auxiliary random variables for each litter to be used later. The litters are
indexed in decreasing order of size, and in case of ties in decreasing order
of the auxiliary random variables. Thus $i < j$ need not imply $T_i < T_j$.
The dynamics in (1), when there are no mutations, imply that the fraction of
the population at time $t$ that belongs to a litter $i$, born at time
$T_i \le t$, with original size $X_i$, is given by

$$X_i \prod_{j : T_i < T_j \le t} (1 - X_j),$$

since the size of the litter must be scaled down at the birth of each
subsequent litter. If the size of each litter, and thus also the size of each
family, is furthermore eroded with a constant rate $\mu$, the size at time $t$
of litter $i$ becomes

$$X_i(t) := X_i e^{-\mu(t - T_i)} \prod_{j : T_i < T_j \le t} (1 - X_j) \, 1(T_i \le t),$$

where $1(\cdot)$ is the indicator function. We can describe the sizes and the
ages of the litters at time $t$ with the random probability measure
$\tilde\rho_t$ on $\mathbb{R}_+$, defined by its distribution function

$$F(s : \tilde\rho_t) := 1 - e^{-\mu s} \prod_{i : t - s \le T_i \le t} (1 - X_i),$$

for $s \ge 0$ and $F(s : \tilde\rho_t) := 0$ for $s < 0$. The atoms of
$\tilde\rho_t$ now have sizes $X_i(t)$ and positions $t - T_i$, provided
$T_i \le t$, corresponding to the current sizes and ages at time $t$ of the
litters. By the homogeneity of the Poisson process,
$\tilde\rho = (\tilde\rho_t)_{t \in \mathbb{R}}$ is a stationary process.
Since the process depends on all litters born before time $t$, it does not
describe the composition of the population into litters if we want the process
to start at time 0 with no litters. In that case we must use a cut-off, so
that there are no litters older than $t$ at time $t$. This is described by the
process $\tilde\rho' = (\tilde\rho'_t)_{t \ge 0}$ of random probability
measures on $\mathbb{R}_+$, with distribution functions

$$F(s : \tilde\rho'_t) := 1 - e^{-\mu s} \prod_{i : \max(t - s, 0) \le T_i \le t} (1 - X_i),$$

for $s \ge 0$ and $F(s : \tilde\rho'_t) := 0$ for $s < 0$, so that
$\tilde\rho'_0(ds) = \mu e^{-\mu s} ds$. This process is not stationary, but
it converges in distribution to $\tilde\rho_0$.
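
The relation between the litter sizes $X_i(t)$ and the jumps of the
distribution function can be checked on a small example. The sketch assumes
a finite list of litters, given as pairs (birth time, original size); the
function names `litter_size` and `cdf` are hypothetical.

```python
from math import exp, prod


def litter_size(i, t, mu, litters):
    """Current size X_i(t) of litter i, given litters as pairs (T_j, X_j)."""
    ti, xi = litters[i]
    if ti > t:
        return 0.0                       # litter i is not yet born at time t
    later = prod(1.0 - x for (tj, x) in litters if ti < tj <= t)
    return xi * exp(-mu * (t - ti)) * later


def cdf(s, t, mu, litters):
    """Distribution function of the litter ages at time t, evaluated at s."""
    if s < 0:
        return 0.0
    young = prod(1.0 - x for (ti, x) in litters if t - s <= ti <= t)
    return 1.0 - exp(-mu * s) * young


litters = [(-2.0, 0.5), (-1.0, 0.2)]
eps = 1e-9
jump_at_age_2 = cdf(2.0, 0.0, 0.3, litters) - cdf(2.0 - eps, 0.0, 0.3, litters)
assert abs(jump_at_age_2 - litter_size(0, 0.0, 0.3, litters)) < 1e-6
```

The distribution function jumps at age 2 by the current size of the litter
born at time $-2$, while between the atoms it grows continuously because of
the erosion at rate $\mu$.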

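Lemma 1 $\tilde\rho'_t \xrightarrow{d} \tilde\rho_0$, as $t \to \infty$.

Proof Define $\tilde\rho''_t$, $t \ge 0$, by its distribution function

$$F(s : \tilde\rho''_t) := 1 - e^{-\mu s} \prod_{i : \max(-s, -t) \le T_i \le 0} (1 - X_i),$$

for $s \ge 0$ and $F(s : \tilde\rho''_t) := 0$ for $s < 0$. By the homogeneity
of the Poisson process,
$F(s : \tilde\rho'_t) \overset{d}{=} F(s : \tilde\rho''_t)$, or, equivalently,
$\tilde\rho'_t \overset{d}{=} \tilde\rho''_t$, for all fixed $t \ge 0$.
Note that $F(s : \tilde\rho_0) = F(s : \tilde\rho''_t)$ for $s < t$. For
$s \ge t$, we have

$$0 \le F(s : \tilde\rho_0) - F(s : \tilde\rho''_t) \le e^{-\mu s} \le e^{-\mu t} \to 0,$$

as $t \to \infty$. Thus, the distance between $\tilde\rho''_t$ and
$\tilde\rho_0$ in the total variation metric,
$\sup_{A \in \mathcal{B}(\mathbb{R})} |\tilde\rho''_t(A) - \tilde\rho_0(A)| \le e^{-\mu t} \to 0$,
as $t \to \infty$. (Here $\mathcal{B}(\mathbb{R})$ are the Borel sets of
$\mathbb{R}$.) This implies $\tilde\rho'_t \xrightarrow{d} \tilde\rho_0$. □

Thus $\tilde\rho$ and $\tilde\rho'$ have the same limiting distribution, and
we choose to work with the former process, since it is stationary.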

Theorem 1 The composition of a sample from $\tilde\rho_0$, according to
litters of increasing age, is regenerative with characteristics $(\mu, \nu)$.

Proof By construction, $F(s : \tilde\rho_0)$ is a multiplicative subordinator
with characteristics $(\mu, \nu)$, and the order of the parts of the
regenerative composition obtained by sampling from the closure of its range
corresponds to increasing age of the litters. □

In the light of Proposition 4, the theorem might be a bit surprising.
What we really want is not the composition into litters, but the partition
into families, so we must somehow collect different litters into families. This
will destroy the regenerative property of the composition into litters.

We will now define how the litters are related to each other. We do
this by sampling from $\tilde\rho$. Let $\mathcal{R}_t$ be the closed range of
$F(s : \tilde\rho_t)$. The complement of $\mathcal{R}_t$ in [0, 1] is a union
of disjoint open intervals, $\bigcup_i I_{i,t}$, with

$$I_{i,t} := \big( F((t - T_i)- : \tilde\rho_t), \; F(t - T_i : \tilde\rho_t) \big),$$

so that interval $I_{i,t}$ corresponds to litter $i$. Note that litters $i$
with $T_i > t$, i.e. litters not yet born at time $t$, have
$I_{i,t} = \emptyset$. We also have
$\mathcal{R}_t^{(u)} = \mathcal{R}_{T_i-}$ if $u \in I_{i,t}$, see Figure El

Figure 3: Top: An illustration of $j \prec' i$. The arrow indicates how
$U_i \in I_{j,T_i-}$. Bottom: An example of how the litters 1, ..., 5 (an
arbitrary enumeration) can be related to each other. Here $5 \prec' 4$,
$5 \prec' 3 \prec' 2$ and both 1 and 5 are roots.

We say that litter $i$ originates from litter $j$ if $U_i \in I_{j,T_i-}$, and
in that case we write $j \prec' i$, see Figure 3. There is for each $i$ at
most one $j$ such that $j \prec' i$. There is no such $j$ if
$U_i \in \mathcal{R}_{T_i-}$. Let
$\mathcal{I}_0 := \{i : \nexists j, \; j \prec' i\}$.
This is the set of litters which are not descendants from any other litter,
but descendants from singletons, thus their genotypes are unique at their
times of births. We call these litters roots. We define $\prec$ by
$j \prec i$ if there exist $k_1, \ldots, k_n$ such that
$j \prec' k_1 \prec' \cdots \prec' k_n = i$. The sequence $k_1, \ldots, k_n$
is then unique. Furthermore, we set $j \preceq i$ if $j \prec i$ or $i = j$.
There can be at most one root $j$ for each $i$ such that $j \preceq i$, and in
that case we write $a(i) = j$, and say that $j$ is the root of $i$. What is
not immediately obvious is that each $i \in \mathbb{N}$ has a root (almost
surely).

Lemma 2 Each $i \in \mathbb{N}$ has a root almost surely.

Proof Define recursively
$\mathcal{I}_n := \{i : \exists j \in \mathcal{I}_{n-1}, \; j \prec' i\}$, for
$n \ge 1$, and let the height of a fixed litter $i$ be defined by $H_i := n$
if $i \in \mathcal{I}_n$ and $H_i = \infty$ if
$\nexists n \in \mathbb{N} : i \in \mathcal{I}_n$. We need to show that $H_i$
is finite almost surely.

The height $H_i$ is a function of $\{(T_k, X_k, U_k)\}_{k : T_k \le T_i}$. The
event $\{i \prec' l\}$ is likewise measurable with respect to
$\{(T_k, X_k, U_k)\}_{k : T_i < T_k \le T_l}$. Thus the events $\{H_i = h\}$
and $\{i \prec' l\}$ are independent for all $i, l : T_i < T_l$, and
$h \in \mathbb{N} \cup \{\infty\}$.

Let $p_h := P(H_i \ge h)$ for $h \ge 0$. Obviously $p_0 = 1$. Assume
$h \ge 1$. Note that
$\{H_i \ge 1\} = \{U_i \notin \mathcal{R}_{T_i-}\} = \bigcup_{j : T_j < T_i} \{j \prec' i\}$,
so

$$p_h = \sum_{j : T_j < T_i} P(H_i \ge h, \, j \prec' i) = \sum_{j : T_j < T_i} P(H_j \ge h - 1, \, j \prec' i)$$
$$= \sum_{j : T_j < T_i} P(H_j \ge h - 1) P(j \prec' i) = p_{h-1} \sum_{j : T_j < T_i} P(j \prec' i)$$
$$= p_{h-1} P(U_i \notin \mathcal{R}_{T_i-}) = p_{h-1} (1 - q(1 : 1)').$$

In the last equality we used (4) with $n = 1$. Thus
$p_h = (1 - q(1 : 1)')^h$ and therefore $H_i$ has a geometric distribution
with parameter $q(1 : 1)'$ and is finite almost surely. □

By our interpretation of the relation $\prec$ as a genealogical relation, we
should let all litters with the same root have the same genotype. We define
$R_i$, the genotype of litter $i$, by $R_i := U_{a(i)}$. Now we can finally
define $\rho = (\rho_t)_{t \in \mathbb{R}}$:

$$\rho_t := \sum_i X_i(t) \delta_{R_i} + \Big( 1 - \sum_i X_i(t) \Big) \lambda.$$

Note that this is a stationary version, and $\rho_0 \ne \lambda$. In the
finite intensity case, $\rho$ behaves as in (6) between jumps, just as we
wanted, and at the time of a jump, the new litter chooses its genotype from
the population at the moment before the jump, just as in (1).

At a fixed time $t$, $\rho_t$ represents the population in the sense that a
sample from the population will have a partition with distribution as given by
$\sim_{\rho_t}$. An i.i.d. sample $(r_i)_{i \in [n]}$ from a realization of
$\rho_t$ can be interpreted as the genotypes of individuals $i = 1, \ldots, n$
in a sample from the population at time $t$. The value of an $r$ with
distribution $\rho_t$ can either be one of the $R_1, R_2, \ldots$, or, with
probability $1 - \sum_i X_i(t)$, it is uniformly drawn from [0, 1].

The justification for the construction is given by Theorem 2.

Theorem 2 The partition of a sample from $\rho_0$ according to families has
the same distribution as the partition according to genotypes of a sample from
a Λ-coalescent with mutations, i.e. its distribution is given by the recursion
(2).

Proof We assume the sample size $n \ge 2$ and that the sample is created by
first sampling from $\mathcal{R}_0$ with the i.i.d. uniform random variables
$(V_i)_{i \in [n]}$, and then collecting the litters into families. Note that
$(V_{in})_{i = j, \ldots, n}$, when disregarding their order, are i.i.d.
$U(v, 1)$, given $V_{jn} > v$. We will use the notation from Section 4.
Consider the realization $V_{1n} = v$. Three possibilities exist.

1. $v \in \mathcal{R}_0$. This happens with probability $q(n : 1)'$. Then 1 is
a singleton and is thus in a family of its own.

2. $V_{1n} \notin \mathcal{R}_0$ and
$[V_{1n}, V_{2n}] \cap \mathcal{R}_0 \ne \emptyset$. This happens with
probability $q(n : 1)''$. Then 1 is in a litter of its own, say litter $k$,
and what family litter $k$ belongs to is determined by a uniform random
variable $U_k$ on $\mathcal{R}_0^{(v)} = \mathcal{R}_{T_k-}$.

3. $[V_{1n}, V_{mn}] \cap \mathcal{R}_0 = \emptyset$ and either $m = n$, or
$2 \le m < n$ and $[V_{mn}, V_{m+1,n}] \cap \mathcal{R}_0 \ne \emptyset$. Then
$m$ individuals belong to the same litter, say litter $k$. This happens with
probability $q(n : m)$. What family this litter belongs to is determined by a
uniform random variable $U_k$ on $\mathcal{R}_0^{(v)} = \mathcal{R}_{T_k-}$.

In case 1, we immediately find that the first part of our composition has size
1. The distribution of the rest of the sample is determined by
$(V_{in})_{i = 2, \ldots, n}$ and $\mathcal{R}_0^{(v)}$. By the regenerative
property, the distribution of the rest of the sample will be the same as
sampled with $(V_i)_{i \in [n-1]}$ from $\mathcal{R}_0$.

In case 2, the lineages of the sample can be represented by $U_k$ and
$(V_{in})_{i = 2, \ldots, n}$ and their partition is obtained by sampling from
$\mathcal{R}_0^{(v)} = \mathcal{R}_{T_k-}$, which by the regenerative property
yields the same result in distribution as sampling with $(V_i)_{i \in [n]}$
from $\mathcal{R}_0$.

In the third case, we know that the lineages represented by
$(V_{in})_{i \in [m]}$ have coalesced since they originate from a common
litter, say litter $k$, but we do not know to which family they belong. This
is determined by the realization of $U_k$ relative to $\mathcal{R}_0^{(v)}$,
which, if $2 \le m < n$, together with the realization of
$(V_{in})_{i = m+1, \ldots, n}$ determines the further coalescing of lineages.
As in case 2, the distribution will be the same as if we sample with
$(V_i)_{i \in [n-m+1]}$ from $\mathcal{R}_0$.

The argument is now similar to the one in Möhle [6] and Dong et al. [2],
with the main difference that our case 2 above does not add any information
about the final partition, whereas they only have either mutations/freezing
(our case 1) or collisions (our case 3) happening at each stage. We thus
get the recursion

$$q(a) = q(n : 1)' \, q(a - e_1) + q(n : 1)'' \, q(a) + \sum_{m=2}^{n} q(n : m) \sum_{j=1}^{n-m+1} \frac{j(a_j + 1)}{n - m + 1} \, q(a + e_j - e_{j+m-1}),$$

or equivalently, by (3), (4), (5) and
$\Phi(n : m) = \binom{n}{m} \lambda_{n,m}$ for $2 \le m \le n$,

$$\Phi(n) q(a) = \mu n \, q(a - e_1) + n \int_0^1 x (1 - x)^{n-1} \, \nu(dx) \, q(a) + \sum_{m=2}^{n} \binom{n}{m} \lambda_{n,m} \sum_{j=1}^{n-m+1} \frac{j(a_j + 1)}{n - m + 1} \, q(a + e_j - e_{j+m-1}),$$

and since
$\Phi(n) - n \int_0^1 x (1 - x)^{n-1} \, \nu(dx) = \mu n + \sum_{m=2}^{n} \binom{n}{m} \lambda_{n,m} = \mu n + \lambda_n$,
we arrive at (2), and the proof is complete. □

Figure 4: Illustration of sampling with $(V_i)_{i \in [7]}$ from the
regenerative set that corresponds to $\tilde\rho_0$. The arrows indicate how
the litters are related to each other. Compare with Figures 2 and 3.

We illustrate the procedure of the proof with Figure 4. The procedure
amounts to moving from left to right and noting how the arrows hit
$\mathcal{R}_0$ or its complement. First, $V_4$ is alone in its interval,
corresponding to case 2 in the proof. At this point we cannot say anything
about the final partition since that litter may be related to the other
individuals in our sample. Next, $V_7$ hits $\mathcal{R}_0$ so that it is in a
family of its own (case 1). The next event is that both $V_2$ and $V_5$ fall
in the same interval, corresponding to a merger of their lineages and case 3
in the proof. After that, we have a case 2 for the lineage of 2 and 5. Next,
we find that the litter of individual 4 is a root. Then lineages 1, 3 and 6
coalesce. The penultimate event is that the litters of lineages 2 and 5, and
1, 3 and 6, are related to the same litter, and thus these lineages coalesce.
The final event is finding that this litter also is a root. Thus the partition
is {{1, 2, 3, 5, 6}, {4}, {7}}, just as the example of Figure 1. The order of
the collisions and mutations is also the same as in that example.

Remark 2 Our construction of $\rho$ requires $\nu$ to be a measure on (0, 1)
with $\int_{(0,1)} x \, \nu(dx) < \infty$. This excludes a large class of
Λ-coalescents. The moment condition is necessary when we want to construct the
multiplicative subordinator $F(s : \tilde\rho_0)$ (whose properties we use
repeatedly) from the point process $\{(T_i, X_i)\}_{i \in \mathbb{N}}$.
Nevertheless, it might be possible to obtain a convergence result analogous to
the one of Proposition 1, but we have not been able to do so.

Acknowledgment I thank the two referees for their thorough reading and 